Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add pd.asof_merge #13358

Closed
wants to merge 1 commit into from
Closed

ENH: add pd.asof_merge #13358

wants to merge 1 commit into from

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Jun 3, 2016

closes #1870
xref #2941

Here is a notebook of example usage and timings

@jreback jreback added Enhancement Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 3, 2016
@jreback jreback added this to the 0.18.2 milestone Jun 3, 2016
@jreback
Copy link
Contributor Author

jreback commented Jun 3, 2016

@sinhrks @jorisvandenbossche @wesm

API question. now we have pd.merge (already existing), pd.ordered_merge (already existing) and pd.asof_merge. Makes sense to keep them separate as the take somewhat different args, but this IS adding to the namespace a bit.

raise MergeError("incompatible merge keys, "
"must be the same type")

# no duplicates on rhs join keys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if the function could handle duplicates automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its easy, my only question is this too magical. IOW would a user expect to get the dups? (e.g. it will work right now w/dups)

Copy link
Contributor Author

@jreback jreback Jun 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fixed to auto-deduplicate right if needed (all internal now).

@codecov-io
Copy link

codecov-io commented Jun 3, 2016

Current coverage is 84.32%

Merging #13358 into master will increase coverage by 0.03%

@@             master     #13358   diff @@
==========================================
  Files           138        138          
  Lines         50929      51058   +129   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42926      43055   +129   
  Misses         8003       8003          
  Partials          0          0          

Powered by Codecov. Last updated by 9670b31...4592fa2

@jreback jreback force-pushed the asof branch 2 times, most recently from b9ff55d to 5b4f3d8 Compare June 3, 2016 20:10
@tacaswell
Copy link
Contributor

I think adding to the name space (with in reason) is better than shoe horning partially overlapping APIs together. Having args/kwargs which are not used / disallowed / have different meaning based on the value of other input should be avoided.

try:
rhs = right.iloc[rby.indices[key]]
except KeyError:
# key doesn't exist in left
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't the key exist in the left if the failure is from a lookup on the right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing ticker for a quotes (looks like I am bugging on it ATM

In [1]: trades = pd.read_csv('pandas/tools/tests/data/trades.csv')

In [2]: trades.time = pd.to_datetime(trades.time)

In [3]: quotes = pd.read_csv('pandas/tools/tests/data/quotes.csv')

In [4]: quotes.time = pd.to_datetime(quotes.time)

In [5]: pd.asof_merge(trades, quotes, on='time', by='ticker')
Out[5]: 
                      time ticker   price  quantity marketCenter     bid     ask
0  2016-05-25 13:30:00.023   MSFT   51.95        75       NASDAQ   51.95   51.95
1  2016-05-25 13:30:00.038   MSFT   51.95       155       NASDAQ   51.95   51.95
2  2016-05-25 13:30:00.048   GOOG  720.77       100       NASDAQ  720.50  720.93
3  2016-05-25 13:30:00.048   GOOG  720.92       100       NASDAQ  720.50  720.93
4  2016-05-25 13:30:00.048   GOOG  720.93       200       NASDAQ  720.50  720.93
5  2016-05-25 13:30:00.048   GOOG  720.93       300       NASDAQ  720.50  720.93
6  2016-05-25 13:30:00.048   GOOG  720.93       600       NASDAQ  720.50  720.93
7  2016-05-25 13:30:00.048   GOOG  720.93        44       NASDAQ  720.50  720.93
8  2016-05-25 13:30:00.074   AAPL   98.67    478343       NASDAQ     NaN     NaN
9  2016-05-25 13:30:00.075   AAPL   98.67    478343       NASDAQ   98.55   98.56
10 2016-05-25 13:30:00.075   AAPL   98.66         6       NASDAQ   98.55   98.56
11 2016-05-25 13:30:00.075   AAPL   98.65        30       NASDAQ   98.55   98.56
12 2016-05-25 13:30:00.075   AAPL   98.65        75       NASDAQ   98.55   98.56
13 2016-05-25 13:30:00.075   AAPL   98.65        20       NASDAQ   98.55   98.56
14 2016-05-25 13:30:00.075   AAPL   98.65        35       NASDAQ   98.55   98.56
15 2016-05-25 13:30:00.075   AAPL   98.65        10       NASDAQ   98.55   98.56
16 2016-05-25 13:30:00.075   AAPL   98.55         6         ARCA   98.55   98.56
17 2016-05-25 13:30:00.075   AAPL   98.55         6         ARCA   98.55   98.56
18 2016-05-25 13:30:00.076   AAPL   98.56      1000         ARCA   98.55   98.56
19 2016-05-25 13:30:00.076   AAPL   98.56       200         ARCA   98.55   98.56
20 2016-05-25 13:30:00.076   AAPL   98.56       300         ARCA   98.55   98.56
21 2016-05-25 13:30:00.076   AAPL   98.56       400         ARCA   98.55   98.56
22 2016-05-25 13:30:00.076   AAPL   98.56       600         ARCA   98.55   98.56
23 2016-05-25 13:30:00.076   AAPL   98.56       200         ARCA   98.55   98.56
24 2016-05-25 13:30:00.078   MSFT   51.95       783       NASDAQ   51.92   51.95
25 2016-05-25 13:30:00.078   MSFT   51.95       100       NASDAQ   51.92   51.95
26 2016-05-25 13:30:00.078   MSFT   51.95       100       NASDAQ   51.92   51.95

In [6]: pd.asof_merge(trades, quotes[quotes.ticker!='MSFT'], on='time', by='ticker')
IndexError: indices are out-of-bounds
In [7]: quotes[quotes.ticker!='MSFT']
Out[7]: 
                      time ticker     bid     ask
0  2016-05-25 13:30:00.023   GOOG  720.50  720.93
3  2016-05-25 13:30:00.048   GOOG  720.50  720.93
4  2016-05-25 13:30:00.048   GOOG  720.50  720.93
5  2016-05-25 13:30:00.048   GOOG  720.50  720.93
6  2016-05-25 13:30:00.048   GOOG  720.50  720.93
7  2016-05-25 13:30:00.072   GOOG  720.50  720.88
8  2016-05-25 13:30:00.075   AAPL   98.55   98.56
9  2016-05-25 13:30:00.076   AAPL   98.55   98.56
10 2016-05-25 13:30:00.076   AAPL   98.55   98.56
11 2016-05-25 13:30:00.076   AAPL   98.55   98.56

Copy link
Contributor

@chrisaycock chrisaycock Jun 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have NaN in the resulting DataFrame when the right misses a key, just like in a regular left merge().

(This should be part of the regression tests as well, and I totally forgot about it earlier when I came-up with examples. Sorry about that.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [5]: pd.asof_merge(trades, quotes[quotes.ticker!='MSFT'], on='time', by='ticker')
Out[5]: 
                      time ticker   price  quantity marketCenter     bid     ask
0  2016-05-25 13:30:00.023   MSFT   51.95        75       NASDAQ     NaN     NaN
1  2016-05-25 13:30:00.038   MSFT   51.95       155       NASDAQ     NaN     NaN
2  2016-05-25 13:30:00.048   GOOG  720.77       100       NASDAQ  720.50  720.93
3  2016-05-25 13:30:00.048   GOOG  720.92       100       NASDAQ  720.50  720.93
4  2016-05-25 13:30:00.048   GOOG  720.93       200       NASDAQ  720.50  720.93
5  2016-05-25 13:30:00.048   GOOG  720.93       300       NASDAQ  720.50  720.93
6  2016-05-25 13:30:00.048   GOOG  720.93       600       NASDAQ  720.50  720.93
7  2016-05-25 13:30:00.048   GOOG  720.93        44       NASDAQ  720.50  720.93
8  2016-05-25 13:30:00.074   AAPL   98.67    478343       NASDAQ     NaN     NaN
9  2016-05-25 13:30:00.075   AAPL   98.67    478343       NASDAQ   98.55   98.56
10 2016-05-25 13:30:00.075   AAPL   98.66         6       NASDAQ   98.55   98.56
11 2016-05-25 13:30:00.075   AAPL   98.65        30       NASDAQ   98.55   98.56
12 2016-05-25 13:30:00.075   AAPL   98.65        75       NASDAQ   98.55   98.56
13 2016-05-25 13:30:00.075   AAPL   98.65        20       NASDAQ   98.55   98.56
14 2016-05-25 13:30:00.075   AAPL   98.65        35       NASDAQ   98.55   98.56
15 2016-05-25 13:30:00.075   AAPL   98.65        10       NASDAQ   98.55   98.56
16 2016-05-25 13:30:00.075   AAPL   98.55         6         ARCA   98.55   98.56
17 2016-05-25 13:30:00.075   AAPL   98.55         6         ARCA   98.55   98.56
18 2016-05-25 13:30:00.076   AAPL   98.56      1000         ARCA   98.55   98.56
19 2016-05-25 13:30:00.076   AAPL   98.56       200         ARCA   98.55   98.56
20 2016-05-25 13:30:00.076   AAPL   98.56       300         ARCA   98.55   98.56
21 2016-05-25 13:30:00.076   AAPL   98.56       400         ARCA   98.55   98.56
22 2016-05-25 13:30:00.076   AAPL   98.56       600         ARCA   98.55   98.56
23 2016-05-25 13:30:00.076   AAPL   98.56       200         ARCA   98.55   98.56
24 2016-05-25 13:30:00.078   MSFT   51.95       783       NASDAQ     NaN     NaN
25 2016-05-25 13:30:00.078   MSFT   51.95       100       NASDAQ     NaN     NaN
26 2016-05-25 13:30:00.078   MSFT   51.95       100       NASDAQ     NaN     NaN

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4588 reared its ugly head, so had to 'fix' it.

@jreback jreback force-pushed the asof branch 2 times, most recently from 6344d19 to 0cf6339 Compare June 3, 2016 22:12
@jreback jreback changed the title ENH: add pd.asof_merge, closes #1870 ENH: add pd.asof_merge Jun 3, 2016
tolerance: integer or Timedelta, optional, default None
select asof tolerance within this range, must be compatible
to the merge index.
strict: boolean, default False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I found this keyword name counter-intuitive. More verbose, but would something like allow_exact_matches=True be better?

Copy link
Contributor

@chrisaycock chrisaycock Jun 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"strict" means "strictly less-than", as opposed to "less-than-or-equal-to". I agree that the parameter's explanation could be clearer.

Copy link
Contributor Author

@jreback jreback Jun 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about one of these:

  1. strict='less_than_or_equal' | 'less_than'
  2. match='less_than_or_equal' | 'less_than'
  3. match_less_that_or_equal=True | False
  4. allow_exact_matches=True | False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anyone have thoughts on strict -> new name? (from above)

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jun 6, 2016

@jreback Just a general comment: for me (as a not financial guy :-)), the current explanation of the method (and also the examples) is not really clarifying on what it exactly does .. (at a first read, without looking at the code, which a user should not have to do). So we will have to do a better job explaining it I think.

Some things I was thinking:

  • 'asof' is not a well-known term I think, so we need to explain it here as well (you should not have to look at the .asof() docstring to know what it means.
  • maybe a more dummy example would also help? (the current example is rather detailed (milliseconds, ..), making it more difficult to grasp in an glance what it does) But can certainly keep the current example as an actual practical example!

Will take a more detailed look myself later.

@jreback jreback force-pushed the asof branch 2 times, most recently from f346fa0 to 961ac76 Compare June 6, 2016 15:06
@wesm
Copy link
Member

wesm commented Jun 8, 2016

@jreback I think this needs to be a new API. I think adding asof_merge to the pandas NS is fine. see also http://code.kx.com/wiki/Reference/aj

@jreback
Copy link
Contributor Author

jreback commented Jun 9, 2016

Ok taking @chrisaycock suggestion, changed strict -> allow_exact_matches, much more clear.

Here's the new signature.

Signature: pd.asof_merge(left, right, on=None, left_on=None, right_on=None, by=None, suffixes=('_x', '_y'), tolerance=None, allow_exact_matches=True)
Docstring:
Perform an asof merge. This by definition will fill the right data with
the asof the left data. The data must be ordered according to the 'on'
(typically time). Optionally perform group-wise merge.

Asof means return the last good (non-NaN) value in object for the date in
question. This is conceptually very close to a left-outer join with
forward filling.

.. versionadded 0.18.2

Parameters
----------
left: DataFrame
right: DataFrame
    Data will be de-duplicated on the [by, on] combination if needed.
on: label or list
    Field names to join on. Must be found in both DataFrames.
    The data MUST be ordered. Furthermore this must be a datetimelike or
    integer-index.
left_on: label or list, or array-like
    Field names to join on in left DataFrame. Can be a vector or list of
    vectors of the length of the DataFrame to use a particular vector as
    the join key instead of columns
right_on: label or list, or array-like
    Field names to join on in right DataFrame or vector/list of vectors per
    left_on docs
by: column name or list of column names
    Group both the left and right DataFrames by the group columns; perform
    the merge operation on these pieces and recombine.
suffixes: 2-length sequence (tuple, list, ...)
    Suffix to apply to overlapping column names in the left and right
    side, respectively
tolerance: integer or Timedelta, optional, default None
    select asof tolerance within this range, must be compatible
    to the merge index.
allow_exact_matches: boolean, default True
    If True, allow matching the same 'on' value (e.g. same time)
    If False, don't match the same 'on' value
        (e.g. don't match the same time)

Returns
-------
merged : DataFrame

@jreback
Copy link
Contributor Author

jreback commented Jun 9, 2016

any other comments?

@chrisaycock
Copy link
Contributor

What about this for the help statement?

Perform an asof merge. This is similar to a left-join except that we match on
nearest key rather than equal keys.

For each row in the left DataFrame, we select the last row in the right DataFrame
whose 'on' key is less than the left's key. Both DataFrames must be sorted by the
key.

Optionally perform group-wise merge. This matches the 'by' key equally, in
addition to the nearest match on the 'on' key.


Parameters
----------
left: DataFrame
right: DataFrame
    Data will be de-duplicated on the [by, on] combination if needed.
on: label or list
    Field names to join on. Must be found in both DataFrames.
    The data MUST be ordered. Furthermore this must be a datetimelike or
    integer-index.
left_on: label or list, or array-like
    Field names to join on in left DataFrame. Can be a vector or list of
    vectors of the length of the DataFrame to use a particular vector as
    the join key instead of columns
right_on: label or list, or array-like
    Field names to join on in right DataFrame or vector/list of vectors per
    left_on docs
by: column name or list of column names
    Group both the left and right DataFrames by the group columns; perform
    the merge operation on these pieces and recombine.
suffixes: 2-length sequence (tuple, list, ...)
    Suffix to apply to overlapping column names in the left and right
    side, respectively
tolerance: integer or Timedelta, optional, default None
    select asof tolerance within this range; must be compatible
    to the merge index.
allow_exact_matches: boolean, default True
    If True, allow matching the same 'on' value (i.e. less-than-or-equal-to)
    If False, don't match the same 'on' value
        (i.e., stricly less-than)

@jreback
Copy link
Contributor Author

jreback commented Jun 9, 2016

updated according to @chrisaycock doc-string


Similar to an ``pd.ordered_merge()``, an ``pd.asof_merge()`` allows one to propogate forward
values that are missing. ``pd.asof_merge()`` allows some specific options on this propogation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This propagates forward values that are not missing, right? Also, not sure what the second sentence is saying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote basically using the new doc-string.

@chris-b1
Copy link
Contributor

chris-b1 commented Jun 9, 2016

To @jorisvandenbossche's point, how about adding a smaller context-free example that makes it a little easier to see exactly what's going on. Something like this: (edit: changed 'right_val' to numbers so it's easier to match up the rows)

a = pd.DataFrame({'a':[1, 5, 10],
                  'left_val':['a', 'b', 'c']})
b = pd.DataFrame({'a': [1, 2, 3, 6, 7],
                  'right_val': [1, 2, 3, 6, 7],})

pd.asof_merge(a, b, on='a')

It might also be good to show an ordered_merge on the same set of data so you can see how they are different?

@jreback
Copy link
Contributor Author

jreback commented Jun 9, 2016

sure maybe just in the actual docs / or in doc-string as well?

@chris-b1
Copy link
Contributor

chris-b1 commented Jun 9, 2016

yeah, I guess was just thinking about the docs

allow_exact_matches : boolean, default True

- If True, allow matching the same 'on' value
(i.e. less-than-or-equal-to)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you align the (i.e. with If (otherwise sphinx will complain about unexpected indentation). Same for the lines below (and for check_duplicates)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, sure. that makes sense.

@jreback jreback force-pushed the asof branch 3 times, most recently from 01ff187 to 4e37577 Compare June 12, 2016 16:20
@jreback
Copy link
Contributor Author

jreback commented Jun 14, 2016

any final issues? @jorisvandenbossche @TomAugspurger


A long-time requested feature has been added thru the ``pd.merge_asof()`` function, to
support asof style joining of time-series. (:issue:`1870`). Full documentaion are
:ref:`here <mergeing.merge_asof>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mergeing -> merging

@jorisvandenbossche
Copy link
Member

@jreback I added another bunch of doc comments (they are not critical, I can also do a PR later if you want)

But I wanted to come back to the duplicate handling for a moment. In case of only on is specified (so no by), it depends on if it is an 'equal' match or an 'less than' match for how duplicates (in the 'on' key column) in the right dataframe are handled:

In [1]: df1 = pd.DataFrame({'key':[1,1,3], 'left_val':[1,2,3]})
   ...: df2 = pd.DataFrame({'key':[1,3,3], 'right_val':[1,2,3]})
   ...:

In [2]: pd.merge_asof(df1, df2, on='key')
Out[2]:
   key  left_val  right_val
0    1         1          1
1    1         2          1
2    3         3          2
3    3         3          3

In [3]: df1 = pd.DataFrame({'key':[1,1,3], 'left_val':[1,2,3]})
   ...: df2 = pd.DataFrame({'key':[1,2,2], 'right_val':[1,2,3]})

In [4]: pd.merge_asof(df1, df2, on='key')
Out[4]:
   key  left_val  right_val
0    1         1          1
1    1         2          1
2    3         3          3

So in case of an exact match, you get both duplicates (consistent with merge), but if you have a inexact match, only the last row of the duplicates is used. This seems a bit inconsistent to me.

(stressing again that I do not know the financial application of this method, but just look at it based on my understanding of the general concept of an asof join, so I don't know what the 'logical' way is to treat duplicates. I just notice that it is inconsistent with merge/merge_ordered)

ENH: add DataFrame.asof

ENH: add pd.asof_merge

closes pandas-dev#1870
@jreback
Copy link
Contributor Author

jreback commented Jun 15, 2016

your example of the duplicates is correct. This is exactly the purpose of an asof merge, NOT to duplicate (pun!) the point of merge. An asof-merge will get you exactly 1 value for each of the left. That's why duplicates are a bit odd, you normally don't have them.
Your example is not quite right, you are now emulating an inner join (as the how is implicit), but a how='left'.

In [5]: pd.merge_asof(df1, df2, on='key')
Out[5]: 
   key  left_val  right_val
0    1         1          1
1    1         2          1
2    3         3          3

In [6]: pd.merge(df1, df2, on='key', how='left')
Out[6]: 
   key  left_val  right_val
0    1         1        1.0
1    1         2        1.0
2    3         3        NaN

I added a test with your examples in any event (as well as updated the docs).

@jreback
Copy link
Contributor Author

jreback commented Jun 16, 2016

@jorisvandenbossche ready to go.

@jorisvandenbossche
Copy link
Member

(sorry for the late answer)
OK, I see the logic in that (and the consistency with a left merge -> only the NaN is filled), but on the other hand, the "nearest" and "last row" in the docstring ("match on nearest key rather than equal keys." and "we select the last row in the right DataFrame whose 'on' key is less than or equal to the left's key.") does not really apply on the 'equal' case (not the last row, but all rows but equal key are selected), only on the 'less than' case.

Further, I have another question on the duplicate. There is also a difference in handling of duplicates when using by or not (using the same example as above, but with a 'by' column (consisting of one value for simplicity of showing)):

In [1]: df1 = pd.DataFrame({'key':[1,1,3], 'by':['a']*3, 'left_val':[1,2,3]})
   ...: df2 = pd.DataFrame({'key':[1,3,3], 'by':['a']*3, 'right_val':[1,2,3]})
   ...:

In [2]: pd.merge_asof(df1, df2, on='key')
Out[2]:
  by_x  key  left_val by_y  right_val
0    a    1         1    a          1
1    a    1         2    a          1
2    a    3         3    a          2
3    a    3         3    a          3

In [3]: pd.merge_asof(df1, df2, on='key', by='by')
Out[3]:
  by  key  left_val  right_val
0  a    1         1          1
1  a    1         2          1
2  a    3         3          3

So when not using by, duplicate equal keys in the right frame are all used in the result, while when using by, only the last row is used.

@jorisvandenbossche
Copy link
Member

(BTW, it's OK for me to merge this version while I further ask questions, also easier to test when in master and to further clean-up docs)

@jorisvandenbossche
Copy link
Member

Using a slightly other example (combining both cases - duplicate equal and non-equal keys in the right frame - in one dataframe:

In [4]: df1 = pd.DataFrame({'key':[1,1,3,5], 'by':['a']*4, 'left_val':[1,2,3,4]})
   ...: df2 = pd.DataFrame({'key':[1,2,2,5,5], 'by':['a']*5, 'right_val':[1,2,3,4,5]})
   ...:

In [5]: pd.merge_asof(df1, df2, on='key')
Out[5]:
  by_x  key  left_val by_y  right_val
0    a    1         1    a          1
1    a    1         2    a          1
2    a    3         3    a          3
3    a    5         4    a          4
4    a    5         4    a          5

In [6]: pd.merge_asof(df1, df2, on='key', by='by')
Out[6]:
  by  key  left_val  right_val
0  a    1         1          1
1  a    1         2          1
2  a    3         3          3
3  a    5         4          5

In [7]: pd.merge_asof(df1, df2, on='key', by='by', check_duplicates=False)
Out[7]:
  by  key  left_val  right_val
0  a    1         1          1
1  a    1         2          1
2  a    3         3          3
3  a    5         4          4
4  a    5         4          5

So actually (in this case), the duplicate=False (when having duplicates, which is not the purpose of this keyword as following the docs you should use it when you are certain there are no duplicates) gives you similar behaviour as when no by is used.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jun 16, 2016

@jreback OK, I just see you actually changed this (check_duplicates is now also used without by), sorry, I didn't update my branch for testing ...
So ignore all my comments from above about inconsistency between using by or not!

@jreback jreback closed this in 6d8c04c Jun 17, 2016
@jreback
Copy link
Contributor Author

jreback commented Jun 17, 2016

ok merged, will see about docs and do a followup probably

@jreback jreback mentioned this pull request Jun 17, 2016
2 tasks
jreback added a commit that referenced this pull request Jun 17, 2016
@jreback jreback mentioned this pull request Aug 11, 2016
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ordered_merge improvements
9 participants